<!DOCTYPE html>
<html>

<head>
  <meta charset="utf-8">
  <title>Peter Kocsis</title>
  <meta content="width=device-width, initial-scale=1.0" name="viewport">
  <meta content="Deep Learning, Diffusion, Computer Graphics, Inverse Rendering, Appearance Decomposition" name="keywords">
  <meta name="description" content="Intrinsic Image Diffusion for Indoor Single-view Material Estimation">
  <meta name="author" content="Peter Kocsis, Vincent Sitzmann, Matthias Niessner">
  <!-- Favicon -->
  <link href="static/favicon.ico" rel="icon">
  <!-- Google Web Fonts -->
  <link rel="preconnect" href="https://fonts.googleapis.com">
  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin="">
  <link href="https://fonts.googleapis.com/css2?family=Open+Sans:wght@400;500;600;700;800&amp;display=swap" rel="stylesheet">
  <!-- Icon Font Stylesheet -->
  <link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.10.0/css/all.min.css" rel="stylesheet">
  <link href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.4.1/font/bootstrap-icons.css" rel="stylesheet">
  <!-- Libraries Stylesheet -->
  <link href="lib/animate/animate.css" rel="stylesheet">
  <link href="lib/owlcarousel/assets/owl.carousel.min.css" rel="stylesheet">
  <!-- Customized Bootstrap Stylesheet -->
  <link href="css/bootstrap.min.css" rel="stylesheet">
  <!-- Stylesheet -->
  <link href="css/style.css" rel="stylesheet">
  <!-- Font awesome -->
  <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
  <!-- Slideshow style -->
  <link rel="stylesheet" href="lib/slideshow/slideshow.css">

</head>

<body data-bs-spy="scroll" data-bs-target=".navbar" data-bs-offset="51">
  <!-- Header Start -->
  <div class="container-fluid bg-light my-6 mt-0 mb-1" id="home">
    <div class="container">
      <div class="row justify-content-center">
        <h2 class="display-32 text-center mt-5 mb-3">Intrinsic Image Diffusion for Indoor Single-view Material Estimation</h2>
        <h4 class="display-32 text-center">CVPR 2024</h4>
      </div>
      <div class="row justify-content-center mt-3 mb-1">
        <div class="col-lg-2">
          <a href="https://peter-kocsis.github.io/" title="Peter Kocsis"><h5 class="display-32 text-center">Peter Kocsis</h5></a>
          <h6 class="display-32 text-center">TU Munich</h6>
        </div>
        <div class="col-lg-2">
          <a href="https://www.vincentsitzmann.com/" title="Vincent Sitzmann"><h5 class="display-32 text-center">Vincent Sitzmann</h5></a>
          <h6 class="display-32 text-center">MIT EECS</h6>
        </div>
        <div class="col-lg-2">
          <a href="https://niessnerlab.org/members/matthias_niessner/profile.html" title="Matthias Niessner"><h5 class="display-32 text-center">Matthias Nie&szlig;ner</h5></a>
          <h6 class="display-32 text-center">TU Munich</h6>
        </div>
      </div>
      <div class="row justify-content-center mb-1">
        <div class="col-lg-2 text-center">
          <a href="https://arxiv.org/abs/2312.12274" title="Paper"><i class="bi bi-file-earmark-text" style="font-size: 2em"></i><br>Paper</a></div>
        <div class="col-lg-2 text-center">
          <a href="https://github.com/Peter-Kocsis/IntrinsicImageDiffusion" title="Code"><i class="bi bi-github" style="font-size: 2em"></i><br>Code</a></div>
        <div class="col-lg-2 text-center">
          <a href="./results/results.html" title="Results"><i class="bi bi-file-earmark-break" style="font-size: 2em"></i><br>Results</a></div>
      </div>
    </div>
  </div>
  <!-- Header End -->
  <!-- Teaser Start -->
  <div class="container">
    <div class="row justify-content-center">
      <div class="video-container" style="text-align:center">
        <video autoplay="" muted="" loop="" width="100%" height="auto" style="pointer-events: none;">
            <source src="./static/teaser.mp4" type="video/mp4">
        </video>
      </div>
    </div>
  </div>
  <!-- Teaser End -->
  <!-- Abstract Start -->
  <div class="container-xxl" id="abstract">
    <div class="container">
      <div class="row wow fadeInLeft" data-wow-delay="0.1s">
        <h1 class="display-32" style="padding:0">Abstract</h1>
        We present Intrinsic Image Diffusion, a generative model for appearance decomposition of indoor scenes.
        Given a single input view, we sample multiple possible material explanations represented as albedo, roughness, and metallic maps.
        Appearance decomposition poses a considerable challenge in computer vision due to the inherent ambiguity between lighting and material properties and the lack of real datasets.
        To address this issue, we advocate for a probabilistic formulation, where instead of attempting to directly predict the true material properties, we employ a conditional generative model to sample from the solution space.
        Furthermore, we show that the strong learned prior of recent diffusion models trained on large-scale real-world images can be adapted to material estimation and substantially improves generalization to real images.
        Our method produces significantly sharper, more consistent, and more detailed materials, outperforming state-of-the-art methods on albedo prediction by 1.5 dB in PSNR and with a 45% better FID score.
        We demonstrate the effectiveness of our approach through experiments on both synthetic and real-world datasets.
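        <p>To put the 1.5 dB figure in perspective, the sketch below computes PSNR as 10 log10(MAX^2 / MSE) and the change in mean squared error implied by a given PSNR gain. It assumes images scaled to [0, 1] (MAX = 1), and the function names are illustrative; the exact evaluation protocol of the paper is not described on this page. At a fixed MAX, a 1.5 dB gain corresponds to roughly 29% lower mean squared error.</p>

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB, assuming pixel values in [0, max_val]."""
    mse = float(np.mean((pred - target) ** 2))
    return float(10.0 * np.log10(max_val ** 2 / mse))


def mse_ratio_for_gain(gain_db: float) -> float:
    """MSE(new) / MSE(old) implied by a PSNR gain of gain_db at a fixed max_val."""
    return 10.0 ** (-gain_db / 10.0)


# A 1.5 dB PSNR gain shrinks the mean squared error to about 70.8% (roughly 29% lower).
ratio = mse_ratio_for_gain(1.5)
print(f"MSE ratio for +1.5 dB: {ratio:.3f}")  # prints "MSE ratio for +1.5 dB: 0.708"

# Sanity check on toy maps: halving the error amplitude quarters the MSE,
# which should raise PSNR by 10 * log10(4), i.e. about 6.02 dB.
rng = np.random.default_rng(0)
target = rng.uniform(0.2, 0.8, size=(64, 64, 3))
noise = rng.normal(0.0, 0.05, size=target.shape)
gain = psnr(target + 0.5 * noise, target) - psnr(target + noise, target)
print(f"PSNR gain: {gain:.2f} dB")  # prints "PSNR gain: 6.02 dB"
```
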
      </div>
    </div>
  </div>
  <!-- Abstract End -->
  <!-- Video Start -->
  <div class="container-xxl">
    <div class="row justify-content-center mb-1 mt-4">
      <div class="video-container" style="text-align:center">
        <iframe class='video' width='720' height='405' src="https://www.youtube.com/embed/lz0meJlj5cA" frameborder="0"
                        allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
                        allowfullscreen>
        </iframe>
      </div>
    </div>
  </div>
  <!-- Video End -->

  <!-- Results Start -->
  <div class="container-xxl mt-5" id="method">
    <div class="container">
      <div class="row wow fadeInRight" data-wow-delay="0.5s">
        <h1 class="display-32" style="padding:0">Results</h1>

        <!-- Slideshow container -->
        <div class="slideshow-container">

          <!-- Full-width images with number and caption text -->
          <div class="mySlides">
            <img src="static/baseline_comparison_0.jpg" style="width:100%">
          </div>

          <div class="mySlides">
            <img src="static/baseline_comparison_1.jpg" style="width:100%">
          </div>

          <div class="mySlides">
            <img src="static/baseline_comparison_2.jpg" style="width:100%">
          </div>

          <div class="mySlides">
            <img src="static/baseline_comparison_3.jpg" style="width:100%">
          </div>

          <!-- Next and previous buttons -->
          <a class="prev" onclick="plusSlides(-1)">&#10094;</a>
          <a class="next" onclick="plusSlides(1)">&#10095;</a>
        </div>
        <br>

        <!-- The dots/circles -->
        <div style="text-align:center">
          <span class="dot" onclick="currentSlide(1)"></span>
          <span class="dot" onclick="currentSlide(2)"></span>
          <span class="dot" onclick="currentSlide(3)"></span>
          <span class="dot" onclick="currentSlide(4)"></span>
        </div>
        
      </div>
    </div>
  </div>
  <!-- Results End -->

  <!-- Applications Start -->
  <div class="container-xxl mt-5" id="method">
    <div class="container">
      <div class="row wow fadeInLeft" data-wow-delay="0.5s">
        <h1 class="display-32" style="padding:0">Applications</h1>

        <h2 class="display-32 mt-3" style="padding:0">Material Editing</h2>
        <div class="row justify-content-center">
          <div class="video-container" style="text-align:center">
            <video autoplay="" muted="" loop="" width="100%" height="auto" style="pointer-events: none;">
                <source src="./static/material_editing.mp4" type="video/mp4">
            </video>
          </div>
        </div>

        <h2 class="display-32 mt-3" style="padding:0">Lighting Editing</h2>
        <div class="row justify-content-center">
          <div class="video-container" style="text-align:center">
            <video autoplay="" muted="" loop="" width="100%" height="auto" style="pointer-events: none;">
                <source src="./static/light_editing.mp4" type="video/mp4">
            </video>
          </div>
        </div>
      </div>
    </div>
  </div>
  <!-- Applications End -->

  <!-- Method Start -->
  <div class="container-xxl mt-5" id="method">
    <div class="container">
      <div class="row wow fadeInRight" data-wow-delay="0.5s">
        <h1 class="display-32" style="padding:0">Method</h1>

        <h2 class="display-32 mt-3" style="padding:0">Material Diffusion</h2>
        <img src="static/pipeline.jpg" style="padding:0" alt="Material Diffusion" />
        We train a conditional diffusion model to predict albedo and BRDF properties (roughness and metallic) from
        a single input image, adapting the learned prior of Stable Diffusion [28] by fine-tuning it on the synthetic InteriorVerse [40] dataset.
        (i) First, we separately encode the ground-truth (GT) albedo and BRDF properties with a fixed encoder to obtain the material feature maps,
        and we encode the conditioning image with a trainable encoder. (ii) We add noise to the material features and use our conditional diffusion
        model to predict the noise. (iii) Training is supervised with an L2 loss between the original and the predicted noise. (iv) Using the predicted
        noise, the predicted albedo and BRDF properties can be decoded separately.
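        <p>Steps (i) to (iv) can be summarized as a single training step. The NumPy sketch below is a simplified illustration, not the authors' released code, and it only computes the loss (no optimizer). It assumes a standard DDPM linear noise schedule, a hypothetical <code>fixed_encoder</code> (2x2 average pooling) and <code>fixed_decoder</code> (2x upsampling) standing in for the frozen pretrained autoencoder, and a user-supplied <code>eps_model</code> that is assumed to contain the trainable image encoder. Step (iv) uses the standard identity z0 = (z_t - sqrt(1 - abar_t) eps_hat) / sqrt(abar_t).</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed noise schedule: the standard DDPM linear beta schedule with T = 1000 steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)


def fixed_encoder(x):
    """Hypothetical frozen encoder (2x2 average pooling), standing in for a pretrained one."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def fixed_decoder(z):
    """Hypothetical frozen decoder (nearest-neighbour 2x upsampling)."""
    return np.repeat(np.repeat(z, 2, axis=0), 2, axis=1)


def add_noise(z0, t, eps):
    """Forward process: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bars[t]) * z0 + np.sqrt(1.0 - alpha_bars[t]) * eps


def predict_z0(z_t, t, eps_hat):
    """Invert the forward process with the predicted noise to estimate z0."""
    return (z_t - np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alpha_bars[t])


def training_step(albedo, brdf, image, eps_model):
    """One training step; eps_model(z_t, t, image) is assumed to encode the image itself."""
    # (i) Encode albedo and BRDF separately with the frozen encoder, then stack channels.
    n_albedo = albedo.shape[-1]
    z0 = np.concatenate([fixed_encoder(albedo), fixed_encoder(brdf)], axis=-1)
    # (ii) Sample a timestep and Gaussian noise, corrupt the features, predict the noise.
    t = int(rng.integers(0, T))
    eps = rng.standard_normal(z0.shape)
    z_t = add_noise(z0, t, eps)
    eps_hat = eps_model(z_t, t, image)
    # (iii) L2 loss between the sampled and the predicted noise.
    loss = float(np.mean((eps - eps_hat) ** 2))
    # (iv) Estimate the clean features and decode albedo and BRDF separately.
    z0_hat = predict_z0(z_t, t, eps_hat)
    albedo_feat, brdf_feat = np.split(z0_hat, [n_albedo], axis=-1)
    return loss, fixed_decoder(albedo_feat), fixed_decoder(brdf_feat)


# Toy usage with a dummy noise predictor that always outputs zeros.
albedo = rng.uniform(size=(8, 8, 3))
brdf = rng.uniform(size=(8, 8, 2))  # roughness and metallic channels
image = rng.uniform(size=(8, 8, 3))
loss, albedo_hat, brdf_hat = training_step(albedo, brdf, image, lambda z, t, c: np.zeros_like(z))
print(loss, albedo_hat.shape, brdf_hat.shape)
```

        <p>In this sketch, stacking the two feature maps lets a single denoiser handle albedo and BRDF together, while splitting them after step (iv) allows each map to be decoded on its own.</p>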

        <h2 class="display-32 mt-3" style="padding:0">Lighting Optimization</h2>
        <div class="container justify-content-center" style="text-align:center">
          <img src="static/lighting_optimization.jpg" alt="Lighting Optimization" style="width:50%"/>
        </div>
        Using the predicted materials, we fit 48 point light sources and a global pre-integrated environment light to the scene by minimizing a reconstruction loss.
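        <p>With the predicted materials held fixed, a purely diffuse image model is linear in the light intensities, so the fit becomes a non-negative least-squares problem. The sketch below makes strong simplifying assumptions that go beyond this page: Lambertian shading only (roughness and metallic are ignored), known per-pixel positions and normals, fixed light positions with only intensities optimized, no shadows, and a single constant ambient term in place of the pre-integrated environment lighting. All function names are hypothetical, and the reconstruction loss is a plain mean squared error minimized by projected gradient descent.</p>

```python
import numpy as np


def shading_basis(points, normals, light_pos):
    """Diffuse shading of each pixel by each unit-intensity point light.

    points, normals: (P, 3) per-pixel surface positions and unit normals (assumed known).
    light_pos: (K, 3) fixed light positions. Returns a (P, K) matrix of
    max(cos, 0) / distance^2 terms; shadowing is ignored.
    """
    d = light_pos[None, :, :] - points[:, None, :]  # (P, K, 3) pixel-to-light vectors
    dist2 = np.sum(d * d, axis=-1)  # (P, K) squared distances
    cos = np.sum(normals[:, None, :] * d, axis=-1) / np.sqrt(dist2)
    return np.maximum(cos, 0.0) / dist2


def render(albedo, basis, intensities, ambient):
    """Lambertian image: albedo * (constant ambient + sum of point-light terms)."""
    irradiance = ambient + basis @ intensities  # (P,)
    return albedo * irradiance[:, None]  # (P, 3)


def fit_lights(image, albedo, basis, steps=2000):
    """Fit non-negative intensities and ambient term by projected gradient descent on MSE."""
    P, K = basis.shape
    # The model is linear in x = [ambient, I_1, ..., I_K]; build its (3P, K + 1) matrix.
    A = np.concatenate([np.ones((P, 1)), basis], axis=1)
    M = (albedo[:, :, None] * A[:, None, :]).reshape(P * 3, K + 1)
    y = image.reshape(P * 3)
    lr = 1.0 / np.linalg.norm(M, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(K + 1)
    for _ in range(steps):
        grad = M.T @ (M @ x - y)  # gradient of 0.5 * ||M x - y||^2
        x = np.maximum(x - lr * grad, 0.0)  # project onto non-negative values
    return x[1:], x[0]


# Toy scene: a flat patch at z = 0 lit by 48 point lights placed above it.
rng = np.random.default_rng(0)
P, K = 500, 48
points = np.column_stack([rng.uniform(-1, 1, P), rng.uniform(-1, 1, P), np.zeros(P)])
normals = np.tile([0.0, 0.0, 1.0], (P, 1))
light_pos = np.column_stack([rng.uniform(-2, 2, K), rng.uniform(-2, 2, K), rng.uniform(0.5, 2.0, K)])
basis = shading_basis(points, normals, light_pos)
albedo = rng.uniform(0.1, 0.9, size=(P, 3))
image = render(albedo, basis, rng.uniform(0.0, 1.0, K), 0.2)

I_hat, amb_hat = fit_lights(image, albedo, basis)
fit_mse = np.mean((render(albedo, basis, I_hat, amb_hat) - image) ** 2)
print(f"reconstruction MSE: {fit_mse:.2e}")
```

        <p>Because every term is linear in the intensities, a dedicated non-negative least-squares solver would also work here; projected gradient descent is used in this sketch because it carries over to differentiable renderers where parameters such as light positions enter non-linearly.</p>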
      </div>
    </div>
  </div>
  <!-- Method End -->

  <!-- Citation Start -->
  <div class="container-fluid bg-light mt-5" id="citation">
    <div class="container">
      <div class="row justify-content-center">
        <h1 class="display-32">Citation</h1>
        <pre><code>@inproceedings{kocsis2024iid,
  author    = {Kocsis, Peter and Sitzmann, Vincent and Nie{\ss}ner, Matthias},
  title     = {Intrinsic Image Diffusion for Indoor Single-view Material Estimation},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2024}
}</code></pre>
      </div>
    </div>
  </div>
  <!-- Citation End -->

  <!-- JavaScript Libraries -->
  <script src="https://code.jquery.com/jquery-3.4.1.min.js"></script>
  <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.0.0/dist/js/bootstrap.bundle.min.js"></script>
  <script src="lib/wow/wow.js"></script>
  <!-- Template Javascript -->
  <script src="js/main.js"></script>
  <!-- Slideshow Javascript -->
  <script src="lib/slideshow/slideshow.js"></script>
</body>

</html>
